Maximizing Text-Mining Performance
نویسندگان
چکیده
WITH THE ADVENT OF CENTRALized data warehouses, where data might be stored as electronic documents or as text fields in databases, text mining has increased in importance and economic value. One important goal in text mining is automatic classification of electronic documents. Computer programs scan text in a document and apply a model that assigns the document to one or more prespecified topics. Researchers have used benchmark data, such as the Reuters-21578 test collection, to measure advances in automated text categorization. Conventional methods such as decision trees have had competitive, but not optimal, predictive performance. Using the Reuters collection, we show that adaptive resampling techniques can improve decision-tree performance and that relatively small, pooled local dictionaries are effective. We’ve applied these techniques to online banking applications to enhance automated e-mail routing.
منابع مشابه
Cut-off Grade Optimization for Maximizing the Output Rate
In the open-pit mining, one of the first decisions that must be made in production planning stage, after completing the design of final pit limits, is determining of the processing plant cut-off grade. Since this grade has an essential effect on operations, choosing the optimum cut-off grade is of considerable importance. Different goals may be used for determining optimum cut-off grade. One of...
متن کاملUnsupervised Phrasal Near-Synonym Generation from Text Corpora
Unsupervised discovery of synonymous phrases is useful in a variety of tasks ranging from text mining and search engines to semantic analysis and machine translation. This paper presents an unsupervised corpus-based conditional model: Near-Synonym System (NeSS) for finding phrasal synonyms and near synonyms that requires only a large monolingual corpus. The method is based on maximizing informa...
متن کاملConditional Information Bottleneck Clustering
We present an extension of the well-known information bottleneck framework, called conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score. This general approach can be utilized in a data mining context to extract relevant information that is at the same time novel relative to known properties or structures...
متن کاملارائه مدلی برای استخراج اطلاعات از مستندات متنی، مبتنی بر متنکاوی در حوزه یادگیری الکترونیکی
As computer networks become the backbones of science and economy, enormous quantities documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text Mining has become an important research area that discoveries unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...
متن کاملA review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1999